In [1]:
import pandas as pd
import seaborn as sns
import plotly.express as px

import matplotlib.pyplot as plt
In [2]:
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"

Matplotlib

For this exercise, we have written the following code to load the stocks dataset built into Plotly Express.

In [3]:
stocks = px.data.stocks()  # load the example stocks dataset (one column per ticker)
stocks.head()
Out[3]:
         date      GOOG      AAPL      AMZN        FB      NFLX      MSFT
0  2018-01-01  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
1  2018-01-08  1.018172  1.011943  1.061881  0.959968  1.053526  1.015988
2  2018-01-15  1.032008  1.019771  1.053240  0.970243  1.049860  1.020524
3  2018-01-22  1.066783  0.980057  1.140676  1.016858  1.307681  1.066561
4  2018-01-29  1.008773  0.917143  1.163374  1.018357  1.273537  1.040708

Question 1:

Select a stock and create a suitable plot for it. Make sure the plot is readable and shows the relevant information, such as dates and values.

In [4]:
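plt.figure(figsize=(15,10))
plt.plot(stocks["AMZN"])
plt.title("Amazon stock")
# Label every 14th row with its date instead of the bare row number
ticks = [0,14,28,42,56,70,84,98]
plt.xticks(ticks, stocks["date"].iloc[ticks])
plt.xlabel("Date")
plt.ylabel("Stock value");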
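Hand-picked tick positions can be avoided by parsing the `date` column, so that Matplotlib places and labels the date ticks itself. A minimal sketch, assuming the dates are stored as strings as displayed in Out[3]; the small `sample` frame only copies the five rows shown there and stands in for the full `stocks` dataframe:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Five rows copied from Out[3]; a stand-in for the full stocks dataframe
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08", "2018-01-15", "2018-01-22", "2018-01-29"],
    "AMZN": [1.000000, 1.061881, 1.053240, 1.140676, 1.163374],
})

# Parse the strings into real timestamps so the x-axis becomes a time axis
sample["date"] = pd.to_datetime(sample["date"])

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sample["date"], sample["AMZN"])
ax.set_title("Amazon stock")
ax.set_xlabel("Date")
ax.set_ylabel("Stock value")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

With the real data, the same two steps (`pd.to_datetime`, then plotting the date column on the x-axis) should apply to `stocks` directly.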

Question 2:

You've already plotted data from one stock. It is also possible to plot several stocks together to support comparison.
To distinguish the lines, customise line styles, markers and colors, and add a legend to the plot.

In [5]:
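ST=["GOOG","AAPL","AMZN","FB","NFLX","MSFT"]
names=["Google","Apple","Amazon","Facebook","Netflix","Microsoft"]
plt.figure(figsize=(15,10))
# Plot one line per ticker and attach its legend label directly
for ticker, name in zip(ST, names):
    plt.plot(stocks[ticker], label=name)
plt.title("Stocks")
# Label every 14th row with its date instead of the bare row number
ticks = [0,14,28,42,56,70,84,98]
plt.xticks(ticks, stocks["date"].iloc[ticks])
plt.xlabel("Date")
plt.legend()
plt.ylabel("Stock value");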
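The loop above leaves every line in the default solid style. One possible way to vary line styles and markers, as the question asks, is to pair each ticker with its own style. This is a sketch on the five rows shown in Out[3] for three of the tickers; the particular styles and markers are arbitrary choices, and the `styles` list is a name introduced here for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Values copied from the five rows shown in Out[3]
sample = pd.DataFrame({
    "GOOG": [1.000000, 1.018172, 1.032008, 1.066783, 1.008773],
    "AAPL": [1.000000, 1.011943, 1.019771, 0.980057, 0.917143],
    "AMZN": [1.000000, 1.061881, 1.053240, 1.140676, 1.163374],
})

# (ticker, legend label, line style, marker); the styles are arbitrary choices
styles = [
    ("GOOG", "Google", "-", "o"),
    ("AAPL", "Apple", "--", "s"),
    ("AMZN", "Amazon", ":", "^"),
]

fig, ax = plt.subplots(figsize=(8, 4))
for ticker, name, linestyle, marker in styles:
    ax.plot(sample[ticker], linestyle=linestyle, marker=marker, label=name)
ax.set_title("Stocks")
ax.set_xlabel("Week")
ax.set_ylabel("Stock value")
ax.legend()
```

Passing `label=` to each `plot` call keeps the legend entries tied to the right line even if the order of the tickers changes.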

Seaborn

First, load the tips dataset.

In [6]:
tips = sns.load_dataset('tips')
tips.head()
Out[6]:
   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4

Question 3:

Let's explore this dataset. Pose a question and create a plot that helps answer it.

Some possible questions:

  • Are there differences between male and female customers when it comes to tipping?
  • Which attribute correlates the most with tip?
In [14]:
# Do women tip more than men at lunch or at dinner?
# Tip as a percentage of the total bill
tips['Tip_per_bill'] = tips["tip"]/tips["total_bill"]*100
Dinner = tips[tips["time"] == 'Dinner']
Lunch = tips[tips["time"] == 'Lunch']
plt.figure(figsize=(15,10))
plt.subplot(121)
sns.boxplot(data=Dinner, x="sex", y="Tip_per_bill")
plt.ylim(0, 75)
plt.title("Dinner")
plt.ylabel("Tip as a percentage of the total bill")
plt.subplot(122)
sns.boxplot(data=Lunch, x="sex", y="Tip_per_bill")
plt.title("Lunch")
plt.ylim(0, 75)
plt.ylabel("Tip as a percentage of the total bill");

Answer

At dinner, women on average tip a higher percentage of the bill than men.

At lunch, men tip a slightly higher percentage.
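The boxplots give a visual answer; a grouped summary is one way to put numbers next to it. A minimal sketch of that calculation, run on only the five rows shown in Out[6] so that it is self-contained. The printed values are therefore illustrative and say nothing about the full dataset; on the real data, `sample` would be replaced by `tips`.

```python
import pandas as pd

# The five rows shown in Out[6]; all of them happen to be dinners
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "time": ["Dinner"] * 5,
})

# Tip as a percentage of the total bill, as in the plotting cell
sample["Tip_per_bill"] = sample["tip"] / sample["total_bill"] * 100

# Median and mean tip percentage for each (time, sex) group
summary = sample.groupby(["time", "sex"])["Tip_per_bill"].agg(["median", "mean"])
print(summary)
```

Reporting both statistics is a deliberate choice: a boxplot draws the median, while an answer phrased in terms of the "average" usually refers to the mean, and the two can disagree when there are outliers.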

Plotly Express

Question 4:

Redo the exercises above (questions 2 and 3) with Plotly Express, creating diagrams that you can interact with.

The stocks dataset

Hints:

  • Turn the stocks dataframe into a structure that can be picked up easily by Plotly Express
In [8]:
ST=["GOOG","AAPL","AMZN","FB","NFLX","MSFT"]
   
Google=pd.DataFrame(stocks[["date","GOOG"]])
GG = Google.rename(columns={"GOOG":"stock value"})
GG["Company"] = "Google"
Apple=pd.DataFrame(stocks[["date","AAPL"]])
AA = Apple.rename(columns={"AAPL":"stock value"})
AA["Company"] = "Apple"
Amazon=pd.DataFrame(stocks[["date","AMZN"]])
ZZ = Amazon.rename(columns={"AMZN":"stock value"})
ZZ["Company"] = "Amazon"
Facebook=pd.DataFrame(stocks[["date","FB"]])
FF = Facebook.rename(columns={"FB":"stock value"})
FF["Company"] = "Facebook"
Netflix=pd.DataFrame(stocks[["date","NFLX"]])
NN = Netflix.rename(columns={"NFLX":"stock value"})
NN["Company"] = "Netflix"
Microsoft=pd.DataFrame(stocks[["date","MSFT"]])
MM = Microsoft.rename(columns={"MSFT":"stock value"})
MM["Company"] = "Microsoft"
frames=[GG, AA, ZZ,FF,NN,MM]
result = pd.concat(frames)
fig = px.line(result, x="date", y="stock value", color="Company")
fig.show();
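The same wide-to-long reshaping can probably be written more compactly with `DataFrame.melt`, which stacks the ticker columns into one column of names and one column of values. A sketch on two rows and two tickers copied from Out[3]; the `names` mapping and the `long_df` variable are names introduced here for illustration:

```python
import pandas as pd

# Two rows and two tickers copied from Out[3]; a stand-in for stocks
sample = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-08"],
    "GOOG": [1.000000, 1.018172],
    "AAPL": [1.000000, 1.011943],
})

# Ticker symbol -> company name used in the legend
names = {"GOOG": "Google", "AAPL": "Apple"}

# Wide (one column per ticker) -> long (one row per date and company)
long_df = sample.melt(id_vars="date", var_name="Company", value_name="stock value")
long_df["Company"] = long_df["Company"].map(names)
print(long_df)
```

The resulting frame has the same three columns (`date`, `Company`, `stock value`) as `result` above, so it could be passed to `px.line` with the same arguments.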

The tips dataset

In [26]:
fig = px.box(tips, x="time", y="Tip_per_bill", color="sex",
             labels={"Tip_per_bill": "Tip as a percentage of the total bill"},
             title="Do women tip more than men at lunch or at dinner?")
fig.show()

Question 5:

Recreate the barplot below that shows the population of different continents for the year 2007.

Hints:

  • Extract the 2007 data from the dataframe and aggregate it per continent
  • Use plotly bar
  • Add different colors for different continents
  • Sort the continents for the visualisation using the axis layout settings
  • Add text to each bar that shows the population
In [35]:
#load data
df = px.data.gapminder()
df.head()
Out[35]:
       country  continent  year  lifeExp       pop   gdpPercap  iso_alpha  iso_num
0  Afghanistan       Asia  1952   28.801   8425333  779.445314        AFG        4
1  Afghanistan       Asia  1957   30.332   9240934  820.853030        AFG        4
2  Afghanistan       Asia  1962   31.997  10267083  853.100710        AFG        4
3  Afghanistan       Asia  1967   34.020  11537966  836.197138        AFG        4
4  Afghanistan       Asia  1972   36.088  13079460  739.981106        AFG        4
In [76]:
# Keep only the 2007 rows, then total the population per continent
seven = df[df["year"]==2007]
ves = seven.groupby("continent", as_index=False)["pop"].sum()
fig = px.bar(ves, y="continent", x="pop", color="continent", text="pop")
# Order the bars by total population
fig.update_yaxes(categoryorder="total ascending")
fig.show()